Efficient Pruning of Probabilistic Automata

Authors

  • Franck Thollard
  • Baptiste Jeudy
Abstract

Applications of probabilistic grammatical inference are limited by time and space constraints. In statistical language modeling, for example, large corpora are now available and lead to automata with millions of states. In this article we propose a method for pruning automata (restricted to tree-based structures) that is not only efficient (sub-quadratic) but also dramatically reduces the size of the automaton with only a small impact on the underlying distribution. Results are evaluated on a language modeling task.

Similar resources

Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms

This article presents an overview of Probabilistic Automata (PA) and discrete Hidden Markov Models (HMMs), and aims at clarifying the links between them. The first part of this work concentrates on probability distributions generated by these models. Necessary and sufficient conditions for an automaton to define a probabilistic language are detailed. It is proved that probabilistic deterministi...

A Link Prediction Method Based on Learning Automata in Social Networks

Nowadays, online social networks are considered one of the most important emerging phenomena of human societies. In these networks, link prediction relies on existing knowledge of the interactions between network actors to estimate the probability that a new relationship will form in the future. A wide range of applications can be found for link prediction such as electro...

Lp Distance and Equivalence of Probabilistic Automata

This paper presents an exhaustive analysis of the problem of computing the Lp distance of two probabilistic automata. It gives efficient exact and approximate algorithms for computing these distances for p even and proves the problem to be NP-hard for all odd values of p, thereby completing previously known hardness results. It further proves the hardness of approximating the Lp distance of two...

On the Computation of Some Standard Distances Between Probabilistic Automata

The problem of the computation of a distance between two probabilistic automata arises in a variety of statistical learning problems. This paper presents an exhaustive analysis of the problem of computing the Lp distance between two automata. We give efficient exact and approximate algorithms for computing these distances for p even and prove the problem to be NP-hard for all odd values of p, t...

On the Efficiency of Deciding Probabilistic Automata Weak Bisimulation

Weak probabilistic bisimulation on probabilistic automata can be decided by an algorithm that needs to check a polynomial number of linear programming problems encoding weak transitions. It is hence polynomial, but not guaranteed to be strongly polynomial. In this paper we show that for polynomial rational probabilistic automata strong polynomial complexity can be ensured. We further discuss co...

Publication date: 2008